DSRC 2—Industry-oriented compression of FASTQ files

نویسندگان

چکیده

برای دانلود رایگان متن کامل این مقاله و بیش از 32 میلیون مقاله دیگر ابتدا ثبت نام کنید

ثبت نام

اگر عضو سایت هستید لطفا وارد حساب کاربری خود شوید

منابع مشابه

LFQC: a lossless compression algorithm for FASTQ files

MOTIVATION Next Generation Sequencing (NGS) technologies have revolutionized genomic research by reducing the cost of whole genome sequencing. One of the biggest challenges posed by modern sequencing technology is economic storage of NGS data. Storing raw data is infeasible because of its enormous size and high redundancy. In this article, we address the problem of storage and transmission of l...

متن کامل

Compression of Unicode Files

The increasing importance of Unicode for text files, for example with Java and in some modern operating systems, implies a possible doubling of data storage space and data transmission time, with a corresponding need for data compression. However it is not clear that data compressors designed for 8-bit byte data are well matched to 16-bit Unicode data. This paper investigates the compression of...

متن کامل

Compression of DNA sequence reads in FASTQ format

MOTIVATION Modern sequencing instruments are able to generate at least hundreds of millions short reads of genomic data. Those huge volumes of data require effective means to store them, provide quick access to any record and enable fast decompression. RESULTS We present a specialized compression algorithm for genomic data in FASTQ format which dominates its competitor, G-SQZ, as is shown on ...

متن کامل

Compression of FASTQ and SAM Format Sequencing Data

Storage and transmission of the data produced by modern DNA sequencing instruments has become a major concern, which prompted the Pistoia Alliance to pose the SequenceSqueeze contest for compression of FASTQ files. We present several compression entries from the competition, Fastqz and Samcomp/Fqzcomp, including the winning entry. These are compared against existing algorithms for both referenc...

متن کامل

Generation of Artificial FASTQ Files to Evaluate the Performance of Next-Generation Sequencing Pipelines

Pipelines for the analysis of Next-Generation Sequencing (NGS) data are generally composed of a set of different publicly available software, configured together in order to map short reads of a genome and call variants. The fidelity of pipelines is variable. We have developed ArtificialFastqGenerator, which takes a reference genome sequence as input and outputs artificial paired-end FASTQ file...

متن کامل

ذخیره در منابع من

ذخیره در منابع من قبلا به منابع من ذحیره شده

{@ msg_add @}

با ذخیره ی این منبع در منابع من، دسترسی به آن را برای استفاده های بعدی آسان تر کنید

ژورنال

عنوان ژورنال: Bioinformatics

سال: 2014

ISSN: 1460-2059,1367-4803

DOI: 10.1093/bioinformatics/btu208